WOMBAT 2025 Tutorial

Visualising Uncertainty

Harriet Mason, Dianne Cook

Department of Econometrics and Business Statistics

Session 2
Diving deeper into uncertainty visualisation using examples in spatial data

Introduction to Spatial Visualisation

Why focus on spatial visualisations?

  • The spatial case is a good example to work through because the aesthetics available for expressing estimates are limited
  • Maps take up most of the usual aesthetics by being a representation of space
    • position, size, shape, etc. all have an implicit meaning in the mapping context
    • colour/fill is usually the only aesthetic we have left
    • can also get creative and do glyph maps (we will ignore this variation here)
  • Once we have filled in a map, colour/fill is often the only aesthetic that has

Citizen Scientist Data

  • There have been reports of a strange spatial pattern in the temperatures of Iowa
  • We get some citizen scientists to measure data at their home and report back
  • To maintain anonymity, we are only provided with the county of each scientist
scientistID  county_name       recorded_temp
#74991       Lyon County       21.1
#22780       Dubuque County    28.9
#55325       Crawford County   26.4
#46379       Allamakee County  27.1
#84259       Jones County      34.2

990 citizen scientists participated

We could just plot the data…

  • We often get spatial data in terms of longitude and latitude which we can plot directly
  • This approach is easy but lacks the contextual information that gives our plots meaning.

Simple features objects

  • sf objects differ from a tibble because they carry additional metadata in the coordinate reference system (CRS). Specifically:
    • Assumptions about the shape of the planet (geodetic datum)
    • Distortions we will/won’t accept when drawing the map (map projection)
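
To make the CRS metadata concrete, here is a minimal sketch that builds a small sf object from longitude/latitude pairs and reprojects it. The coordinates and temperatures are made-up values roughly inside Iowa, not the tutorial data, and the names `obs`, `pts` and `pts_merc` are hypothetical.

```r
library(sf)

# Made-up longitude/latitude pairs (illustrative only, not the tutorial data)
obs <- data.frame(
  lon = c(-93.6, -91.5, -95.9),
  lat = c(41.6, 42.0, 42.5),
  recorded_temp = c(25.1, 27.3, 22.8)
)

# Attach a CRS: EPSG:4326 is longitude/latitude on the WGS84 datum
pts <- st_as_sf(obs, coords = c("lon", "lat"), crs = 4326)

# The CRS travels with the object as metadata
st_crs(pts)

# Reproject to Web Mercator (EPSG:3857): the coordinates change,
# the attached measurements do not
pts_merc <- st_transform(pts, 3857)
```

The datum and the projection are both part of the CRS, which is why changing either one goes through `st_transform()` rather than editing the coordinates by hand.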

Can you see the spatial trend?

Estimate the county mean

  • Visualising an estimate, such as a mean, can make trends easier to see
    • Should use the sampling distribution, but often we do not bother…
Code
library(dplyr)

# Calculate the county mean, its standard error, and the sample size
toy_temp |> 
  group_by(county_name) |>
  summarise(temp_mean = mean(recorded_temp),
            temp_se = sd(recorded_temp)/sqrt(n()),
            n = n()) 
county_name       temp_mean  temp_se  n
Adair County      29.7       0.907    6
Adams County      29.6       1.003    9
Allamakee County  26.3       0.550    8
Appanoose County  22.8       0.831    14
Audubon County    27.6       0.893    11
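
The sampling distribution mentioned above can be sketched from the summary table alone. Assuming the county mean is approximately normally distributed (an assumption; with only 6 observations a t-distribution would be more cautious), an approximate 95% interval for Adair County is the mean plus or minus about 1.96 standard errors. The object names are illustrative.

```r
# Values for Adair County, taken from the summary table
temp_mean <- 29.7
temp_se   <- 0.907

# Approximate 95% interval under an (assumed) normal sampling distribution
z  <- qnorm(0.975)
ci <- c(lower = temp_mean - z * temp_se,
        upper = temp_mean + z * temp_se)
round(ci, 1)
```

Plotting only `temp_mean` throws this interval away, which is the "we do not bother" shortcut.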

Can you see the trend now?

Common Map Visualisations

  • Usually spatial data is shown using a choropleth map
    • Choropleth maps shade an area according to an average or total
  • We can also weight according to a different variable (such as sample size)
    • e.g. Cartograms, and Bubble plots
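
As a minimal sketch of a choropleth, the code below shades four made-up square "counties" by an estimate. The grid, the object names (`toy_regions`, `choropleth`) and the assignment of values to squares are illustrative assumptions; the real county geometries are not reproduced here.

```r
library(sf)
library(ggplot2)

# Four made-up square "counties" on a 2 x 2 grid (illustrative only)
bbox <- st_bbox(c(xmin = 0, ymin = 0, xmax = 2, ymax = 2))
grid <- st_make_grid(st_as_sfc(bbox), n = c(2, 2))

toy_regions <- st_sf(
  temp_mean = c(22.8, 26.3, 27.6, 29.7),
  geometry  = grid
)

# Choropleth: each area is shaded according to its estimate
choropleth <- ggplot(toy_regions) +
  geom_sf(aes(fill = temp_mean)) +
  scale_fill_distiller(palette = "YlOrRd", direction = 1) +
  labs(fill = "Temperature")

choropleth
```

Only `fill` carries the estimate here; position and shape are already used up by the geography.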

But what if the error is worse?

  • It turns out the citizen scientists are using some pretty old tools
  • The standard error could be up to three times what we would estimate with our usual assumptions.
  • We want to see both versions of the data so we can see the impact of this measurement error
county_name       temp_mean  low_temp_se  high_temp_se  n  county_geometry
Adair County      29.7       0.907        2.72          6  MULTIPOLYGON (((441130 -374...
Adams County      29.6       1.003        3.01          9  MULTIPOLYGON (((424556 -414...
Allamakee County  26.3       0.550        1.65          8  MULTIPOLYGON (((675217 -131...
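
The table is consistent with the high standard error being three times the low one. A minimal sketch of that step, using the three counties shown (this appears to be how the column was derived, but that is an inference from the numbers, and `county_se` is a hypothetical name):

```r
# Standard errors for the first three counties, from the table above
county_se <- data.frame(
  county_name = c("Adair County", "Adams County", "Allamakee County"),
  low_temp_se = c(0.907, 1.003, 0.550)
)

# Worst case described above: up to three times the usual estimate
county_se$high_temp_se <- 3 * county_se$low_temp_se
county_se
```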

Spot the difference

  • One of these plots was made with the high standard error data, and the other was made with the low standard error data. Can you tell which is which?

Exercise 1

Make the high and low variance choropleth maps yourself, and see why they come out looking identical

Approaches to Spatial Uncertainty

Solution 1: add an axis for uncertainty

  • Pros
    • Included uncertainty and increased transparency
  • Cons
    • Signal in high-uncertainty areas is still very visible
    • 2D palette is harder to read
      • Colour is not a simple 3D space
      • Using saturation hurts accessibility
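
To see what "adding an axis" means for colour, here is a rough sketch of a 3 x 3 bivariate palette in base R, where hue encodes the temperature class and saturation encodes how certain we are. The specific hue and saturation values are arbitrary choices for illustration, not the palette behind the map in this session.

```r
# Three temperature classes (hue) crossed with three uncertainty classes.
# Hue and saturation values are arbitrary choices for illustration.
temp_hue  <- c(low = 0.60, mid = 0.15, high = 0.00)      # blue, yellow, red
certainty <- c(low_se = 1.0, mid_se = 0.6, high_se = 0.2) # saturation

# Rows: uncertainty class, columns: temperature class
palette_2d <- outer(
  certainty, temp_hue,
  FUN = function(s, h) hsv(h = h, s = s, v = 0.9)
)
palette_2d
```

A reader now has to decode nine colours along two axes at once, and the washed-out row is still coloured by temperature, which is why the high-uncertainty signal remains visible.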

Solution 2: blend the colours together

  • Pros
    • Included uncertainty and increased transparency
    • Removed false signals
  • Cons
    • Still have a 2D colour palette
    • The standard error at which colours are blended is arbitrary
      • Impossible to align with hypothesis testing
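
One way to read "blend the colours together" is sketched below in base R, in the spirit of a value-suppressing uncertainty palette: as uncertainty grows, neighbouring value colours collapse into their average. The base colours and the simple RGB averaging are illustrative assumptions, and the uncertainty cut-offs that decide which row applies are deliberately left out, since choosing them is the arbitrary part.

```r
# Four-colour sequential palette for the estimate (arbitrary illustrative colours)
base_cols <- c("#FFFFB2", "#FECC5C", "#FD8D3C", "#E31A1C")

# Average a set of colours in RGB space (a simple stand-in for a proper blend)
blend <- function(cols) {
  m <- rowMeans(col2rgb(cols))
  rgb(m[1], m[2], m[3], maxColorValue = 255)
}

# Low uncertainty: all four colours are distinguishable
low_se <- base_cols
# Medium uncertainty: neighbouring pairs collapse into one colour each
mid_se <- c(rep(blend(base_cols[1:2]), 2), rep(blend(base_cols[3:4]), 2))
# High uncertainty: every value maps to the same colour
high_se <- rep(blend(base_cols), 4)

rbind(low_se, mid_se, high_se)
```

With all values sharing one colour in the last row, differences between highly uncertain areas can no longer be read off the map, which is how the false signals are removed.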

Solution 3: simulate a sample

  • Pros
    • Included uncertainty
    • High uncertainty interferes with reading of plot (?)
    • 1D colour palette
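
The idea behind simulating a sample can be sketched without any mapping code: draw many values per county from a normal distribution centred on the mean, with the standard error as its spread. The means and standard errors come from the tables in this session; the seed, the number of draws and the object names are arbitrary choices for illustration.

```r
set.seed(2025)  # arbitrary seed so the sketch is reproducible

# Means and standard errors for three counties, from the tables above
temp_mean    <- c(Adair = 29.7, Adams = 29.6, Allamakee = 26.3)
low_temp_se  <- c(0.907, 1.003, 0.550)
high_temp_se <- c(2.72, 3.01, 1.65)

# Draw n_draws values per county from N(mean, se); on a map, each draw
# would colour one small piece of the county
n_draws <- 100
draw <- function(se) {
  sapply(seq_along(temp_mean),
         function(i) rnorm(n_draws, mean = temp_mean[i], sd = se[i]))
}
low_draws  <- draw(low_temp_se)
high_draws <- draw(high_temp_se)

# Spread of the simulated values within each county
apply(low_draws, 2, sd)
apply(high_draws, 2, sd)
```

With the high standard errors the draws within a county vary far more, so the county looks noisy rather than uniformly shaded, while the colour scale itself stays one-dimensional.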

Making a Pixel Map with ggdibbler

A ggdibbler example

Estimate the county distribution

  • Visualising an estimate, such as a mean, can make trends easier to see
    • Should use the sampling distribution, but often we do not bother…
county_name       temp_dist_low  temp_dist_high  n
Adair County      N(30, 0.82)    N(30, 7.4)      6
Adams County      N(30, 1)       N(30, 9.1)      9
Allamakee County  N(26, 0.3)     N(26, 2.7)      8
Appanoose County  N(23, 0.69)    N(23, 6.2)      14
Audubon County    N(28, 0.8)     N(28, 7.2)      11
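
A minimal sketch of how distribution columns like these could be built with the distributional package, using three counties from the earlier tables. The second number in the printed table appears to be the variance rather than the standard error (0.907^2 is about 0.82 and 2.72^2 is about 7.4), so the standard error is passed as `sigma`. Whether the tutorial data were built exactly this way is an assumption.

```r
library(distributional)

# Means and standard errors for three counties, from the earlier tables
temp_mean    <- c(29.7, 29.6, 26.3)
low_temp_se  <- c(0.907, 1.003, 0.550)
high_temp_se <- c(2.72, 3.01, 1.65)

# dist_normal() takes the mean and the standard deviation (not the variance)
temp_dist_low  <- dist_normal(mu = temp_mean, sigma = low_temp_se)
temp_dist_high <- dist_normal(mu = temp_mean, sigma = high_temp_se)

temp_dist_low

# The variance of each distribution is the squared standard error
variance(temp_dist_low)
```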

Using ggdibbler to plot a Distribution

Code
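library(ggplot2)
library(ggdibbler)

# Pass a distribution to fill; geom_sf_sample() draws samples from it
toy_temp_dist |> 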
  ggplot() + 
  geom_sf_sample(aes(geometry = county_geometry,
                     fill=temp_dist_low))

Code
toy_temp_dist |> 
  ggplot() + 
  geom_sf_sample(aes(geometry = county_geometry,
                     fill=temp_dist_high))

Can utilise ggplot2 flexibility

Remember, the plot is random
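
Because the map is drawn from random samples, re-running the same code gives a slightly different picture each time. Assuming the sampling relies on R's random number generator (an assumption about ggdibbler, not something stated in this session), setting a seed before plotting should make a figure reproducible. The base R sketch below shows the behaviour with plain normal draws; the mean and standard deviation are the Adair County values from earlier, and the seed is arbitrary.

```r
# Two draws without resetting the seed generally differ
draw_a <- rnorm(5, mean = 29.7, sd = 2.72)
draw_b <- rnorm(5, mean = 29.7, sd = 2.72)

# Resetting the seed before each draw reproduces the same values
set.seed(1)
draw_c <- rnorm(5, mean = 29.7, sd = 2.72)
set.seed(1)
draw_d <- rnorm(5, mean = 29.7, sd = 2.72)

identical(draw_c, draw_d)
```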

Exercise 2

Here is the code that was used to make the cartogram from earlier in the session. Can you make a ggdibbler version of this plot?

Code
library(sf)
library(cartogram)
library(ggplot2)

# Transform to the CRS needed for the cartogram transformation
toy_merc <- st_transform(toy_temp_mean, 3857)
# cartogram transformation
toy_cartogram <- cartogram_cont(toy_merc, weight = "n", itermax = 5)
# Transform back to original crs 
toy_cartogram <- st_transform(toy_cartogram, st_crs(toy_temp_mean))

# Plot cartogram using ggplot2
ggplot(toy_cartogram) +
  geom_sf(aes(fill = temp_mean), linewidth = 0, alpha = 0.9) +
  theme_minimal() +
  scale_fill_distiller(palette = "YlOrRd", direction= 1) +
  xlab("Longitude") +
  ylab("Latitude") +
  labs(fill = "Temperature") +
  theme(aspect.ratio=0.7)

Solution

Code
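# The only change to the data is adding a distribution column.
# dist_normal() takes the standard deviation, so pass temp_se (not temp_se^2)
toy_cartogram |>
  mutate(temp_dist = distributional::dist_normal(temp_mean, temp_se)) |>
  ggplot() +
  geom_sf_sample(aes(geometry=county_geometry, 
                     fill=temp_dist), linewidth=0) +
  # Overlay county borders so the sampled pieces can be grouped by eye
  geom_sf(aes(geometry=county_geometry), fill=NA, colour="white") +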
  theme_minimal() +
  scale_fill_distiller(palette = "YlOrRd", direction= 1) +
  xlab("Longitude") +
  ylab("Latitude") +
  labs(fill = "Temperature") +
  theme(aspect.ratio=0.7)

Where to learn more

End of session 2

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.